Ontology-Based XQuery'ing of XML-Encoded Language Resources on Multiple Annotation Layers
Abstract
We present an approach for querying collections of heterogeneous linguistic corpora that are annotated on multiple layers using arbitrary XML-based markup languages. An OWL ontology provides a homogenising view on the conceptually different markup languages so that a common querying framework can be established using the method of ontology-based query expansion. In addition, we present a highly flexible web-based graphical interface that can be used to query corpora with regard to several different linguistic properties such as, for example, syntactic tree fragments. This interface can also be used for ontology-based querying of multiple corpora simultaneously.
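The idea of ontology-based query expansion described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: a plain dictionary stands in for the OWL ontology, simple XPath expressions evaluated by `xml.etree.ElementTree` stand in for XQuery, and every element name, attribute name, and tag value (`w`, `pos`, `tok`, `cat`, and so on) is a hypothetical assumption chosen only for illustration. A query phrased against one shared concept is expanded into one concrete path per markup language and then run over all corpora.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the OWL ontology: each shared concept maps to the
# (element, attribute, value) patterns that express it in each markup language.
ONTOLOGY = {
    "Noun": [
        ("w", "pos", "NN"),      # assumed markup of corpus A
        ("tok", "cat", "noun"),  # assumed markup of corpus B
    ],
}


def expand(concept):
    """Expand one ontology concept into a list of concrete XPath queries."""
    return [
        f".//{tag}[@{attr}='{value}']"
        for tag, attr, value in ONTOLOGY[concept]
    ]


def query(corpora, concept):
    """Run every expanded path against every corpus and collect matched text."""
    hits = []
    for root in corpora:
        for path in expand(concept):
            hits.extend(el.text for el in root.findall(path))
    return hits


# Two tiny corpora annotated with conceptually different (invented) markup.
corpus_a = ET.fromstring("<s><w pos='NN'>corpus</w><w pos='VB'>is</w></s>")
corpus_b = ET.fromstring(
    "<text><tok cat='noun'>layer</tok><tok cat='verb'>runs</tok></text>"
)

print(query([corpus_a, corpus_b], "Noun"))  # ['corpus', 'layer']
```

The user only names the concept `Noun`; the mapping decides which concrete elements count as nouns in each corpus, which is what allows several corpora to be queried simultaneously.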
Similar resources
Multidimensional markup and heterogeneous linguistic resources
The paper discusses two topics: firstly an approach of using multiple layers of annotation is sketched out. Regarding the XML representation this approach is similar to standoff annotation. A second topic is the use of heterogeneous linguistic resources (e.g., XML annotated documents, taggers, lexical nets) as a source for semiautomatic multi-dimensional markup to resolve typical linguistic iss...
Cross Document Annotation for Multimedia Retrieval
This paper describes the MUMIS project, which applies ontology based Information Extraction to improve the results of Information Retrieval in multimedia archives. The domain specific ontology, the multilingual lexicons and the information passed between the different processing modules are all encoded in XML. The innovative aspect is the use of a cross document merging algorithm that uses the ...
A Multi-Layered, XML-Based Approach to the Integration of Linguistic and Semantic Annotations
In this paper we present a multi-layered approach to document annotation that allows for the structural integration of linguistic and semantic annotations produced by various language technology tools and using knowledge encoded in different domain ontologies as needed for semantic web applications.
TEITOK: Text-Faithful Annotated Corpora
TEITOK is a web-based framework for corpus creation, annotation, and distribution, that combines textual and linguistic annotation within a single TEI based XML document. TEITOK provides several built-in NLP tools to automatically (pre)process texts, and is highly customizable. It features multiple orthographic transcription layers, and a wide range of user-defined token-based annotations. For ...
Exploring XML-based technologies and procedures for quality evaluation from a real-life case perspective
The use of Extensible Markup Language (XML) for the annotation of Spoken Language Resources (SLR) is becoming increasingly common these days. Therefore the Speech Processing EXpertise centre (SPEX), which is the SLR validation centre of the European Language Resources Association (ELRA), is also being confronted more with XML. The project “Lexica and Corpora for Speech-to-Speech Translation Com...